Combination of In Silico and In Vitro Screening to Identify Novel Glutamate Carboxypeptidase II Inhibitors

Glutamate carboxypeptidase II (GCPII) is a metalloprotease implicated in neurological diseases and prostate oncology. While several classes of potent GCPII-specific inhibitors exist, the development of novel active scaffolds with different pharmacological profiles remains a challenge. Virtual screening followed by in vitro testing is an effective means for the discovery of novel active compounds. Structure- and ligand-based pharmacophore models were created based on a dataset of known GCPII-selective ligands. These models were used in a virtual screening of the SPECS compound library (∼209.000 compounds). Fifty top-scoring virtual hits were further experimentally tested for their ability to inhibit GCPII enzymatic activity in vitro. Six hits were found to have moderate to high inhibitory potency with the best virtual hit, a modified xanthene, inhibiting GCPII with an IC50 value of 353 ± 24 nM. The identification of this novel inhibitory scaffold illustrates the applicability of pharmacophore-based modeling for the discovery of GCPII-specific inhibitors.


Model 1
Model 1 was generated based on two enantiomeric inhibitors (JHU241 and JHU242) binding to the S1 site (PDB 5D29 and 5ELY) 4 ( Fig. S6-B). Models for both structures were generated and combined to a shared feature pharmacophore model using the coordinates of 5ELY 4 . After the optimization process (Fig. S7), the model consisted of eight features and nine Xvols. The S1 site is located deep inside the binding pocket next to the catalytic Zn2+ ions. Key features of this model are therefore the zinc-binding and the NI feature. The binding group is stabilized by a HBD feature with Asp387 and a HBA with Tyr552, one of the amino acids delineating the S1 site. A hydrophobic contact, the second NI position interacting with Gly534 and two HBA features with Tyr234 and Arg534 (also a part of the S1), completed the spatial arrangement (Fig.  3A). This model was highly restrictive (Ef = 1341.5, maximum value) and only found the two compounds in the dataset. Figure S6. A) Pharmacophore model 1 with compound JHU242, based on the PDB entry 5ELY 4 . Blue spheres represent the zinc ions coordinating the ligand hydroxamate moiety. Red stars represent NI pharmacophore features, red arrows HBAs, green arrow HBDs, and the yellow sphere marks a HC B) Structures of inhibitors of the S1 subset. Figure S7. A) automatically generated structure-based pharmacophore model 1 (based on S1 site subset ligands shown in figure S1) B) the optimized model 1.
The originally generated model 1 consisted of 13 features (3 NIs, 7 HBAs, 1 HBD, 1 zinc binding and 1 HC) and 11 Xvols. The model optimization led to removal of the NI feature interacting with Arg534 and 4 HBA features as well (1 interacting with Arg534, 1 with Asn519 and 2 with water molecules). Moreover, 2 Xvols were removed during the model optimization (one placed on water molecule and one on nitrogen of Asn519). This resulted in a model depicted on Figure S7-B. Model 1 consists of 8 features (2 NIs, the first coordinating the zinc atoms and the second is placed next to Tyr234, 3 HBAs hydrogen bonding with Arg534, Tyr234 and Tyr552, 1 HBD interacting with Asp387, one HC and 1 zinc binding feature) and 9 Xvols.

Model 2
Model 2 ( Fig S8) was generated based on three inhibitors, namely (2S)-2-[(N-acetyl-L-alphaaspartyl)amino]nonanoic acid (pdb entry 3SJE 5 ), (2S)-2-[(N-acetyl-L-alphaaspartyl)amino]octanoic acid (3SJG 5 ), and N-acetyl-aspartyl-methionine (3SJX 5 ). The three automatically generated structure-based models were combined 5 . The resulting shared feature model was optimized to enrich active compounds over decoys. The final model consisted of nine features and 16 Xvols. Two zinc binding features anchored the molecule to the catalytic center and a feature complex consisting of an NI and two HBA features connected the molecule to the arginine patch (Arg534, Arg536). In the S1' site the pharmacophore contained another NI, interacting with Arg210 and two HBAs, one engaging Arg210 and the second a conserved water molecule bound to Asn257, Trp381, and Glu425. The model correctly identified ten out of twelve actives for this binding site and 40 decoys. The enrichment metrics are shown in Table 1. This model was less restrictive than model 1 and covered mainly smaller molecular entities.  Figure S9. A) automatically generated structure-based pharmacophore model 2 (based on small S1-S1' site subset ligands shown in figure S3) B) the optimized model 2.
The originally generated model 2 consisted of 17 features (2 NIs, 10 HBAs, 1 HBD, 3 zinc binding and 1 HC) and 21 Xvols. The following features were removed during the model optimization process. 4 HBAs (1 forming interaction with Tyr700, 1 with Tyr552 and 2 with Arg536), the HBD with Asn519, 1 zinc binding feature, and the HC feature. One HBA feature with water molecule HOH1964 was removed, while HBA feature with HOH1808 was kept since this feature significantly increased model performance. Moreover,

Model 3
Structure-based pharmacophore model 3 ( Figure S10) was built based on two GCPII ligands, L-serine-Osulfate (PDB entry 2PVV) 6 and quisqualic acid (PDB entry 2JBK) 7 . Two structure-based models were built on both complexes, overlaid (reference: 2JBK) and a shared feature pharmacophore model was generated. Such automatically generated model consisted of 8 features (1 NI, 1 PI, 3 HBAs, and 3 HBDs) and 8 Xvols. After removing 1 HBA and 2 HBDs interacting with water molecules, the model was optimized to map these two ligands only. The resulting model 3 consists of 5 features (1 NI interacting with Arg210, 1 PI interacting with Glu424, 2 HBAs interacting with Arg210 and Tyr700, and 1 HBD interacting with Glu424) and is depicted on Figure S8-B. Figure S10. A) Shared feature structure-based pharmacophore model 3 representing two inhibitors of the S1'-subset based on PDB entries 2JBK and 2PVV (two ligands of the S1'-subset shown in figure S2). B) The optimized model 3.

Model 4
The description and depiction of the model is shown in the main manuscript. Figure S11. A) automatically generated structure-based pharmacophore model 4 for the GCPII inhibitors of the S1-S1' subset ( figure S4) and B) the optimized model 4.
The automatically generated model 4 consisted of 15 features (12 HBAs and 3 NIs) and 22 Xvols. The NI feature interacting with Lys699 was removed in the optimization process. 7 HBAs were removed as well, 1 interacting with Asn257, 1 with Tyr552, 1 with Tyr700 and 4 with water molecules. The optimized model 4 consists of 7 features (2 NIs, one interacting with Arg534 and Arg536 and the second with Arg210 and His553, 5 HBAs, 1 forming an interaction with Arg534, 2 with Arg536, 1 with Lys699 and 1 with Arg210) and 19 Xvols.

Models 5 and 6
A shared-feature ligand-based model based on the same compounds as the structure-based model 4 ( Fig. 3B) was created to allow for more flexibility in the hydrogen bonding angles. Two models resulted from the optimization process: Model 5 ( Fig. S12-A) contained 37 Xvols, two NIs and four HBA features that were located at similar locations as in model 4 but were not constrained by specific binding angles. An additional HBD feature was introduced that interacted with Glu424 or Gly518 residues of the enzyme. Model 6 was very similar to the model 5, but lacked one HBA feature and contained only 25 Xvols. Model 6 ( Fig. S12-B) was slightly less restrictive than model 5 and allowed us to find all 32 actives in the S1&S1'large dataset in combination with models 5 and model 4. The enrichment metrics of both models are shown in Table 1. Figure S12. A) Model 5: Shared-feature ligand-based pharmacophore model for the S1&S1'-subset with ligand RNA 2-49-1. Red and green spheres mark HBA and HBD features respectively. B) Model 6: Shared feature ligand based pharmacophore model for the S1&S1'-subset with ligand RNA 2-49-1. Figure S13. A) automatically generated ligand-based pharmacophore model 5 for the GCPII inhibitors of the S1-S1' subset ( figure S4) B) the optimized model 5.
The automatically generated model that served as a basis for models 5 and 6 consisted of 7 features (4 HBAs, 2 NIs, and 1 HBD) and 37 Xvols. The amount of model features and tolerances were subsequently optimized and the best performance achieved when the HBA feature interacting with Arg210 and His553 was set to optional. Since optional feature increases computational costs of the screening process, we decided to split this model into two, model 5 where this feature was obligatory ( Figure S10-B) and model 7 where this feature was removed ( Figure S14-B). The optimized model 5 consists of the same 7 features and 37 Xvols.
Overlay of this model with model 4 indicates that one NI feature of this ligand-based model would possibly interact with Arg534 and Arg536 of GCPII while the second NI with Lys699. The NI feature interacting with Arg210 and His553 in model 4 was not generated in this ligand-based model. The HBD would interact with Gly518. The HBA present in model 5 but not in model 6 would potentially interact with Arg210 and His553. The second HBA feature with Lys699 while the third and fourth HBA features, being close to each other interact with Arg534, Arg536 and Asn519. Figure S14. A) automatically generated ligand-based pharmacophore model 6 for the GCPII inhibitors of the S1-S1' subset B) the optimized model 7.

Model 7
The subset spanning S1, S1'& ABS encompassed eight very large inhibitors (Fig. S5). Structure-based model 7 was based on the co-crystallized inhibitor ARM-P4 and consisted in total of 9 features and 29 Xvols (Fig.  S15-A). Two of the NI features interacted with Arg210 and Lys699 of the S1' pocket, while the third NI bound to Arg534 of the S1 pocket. Furthermore, the six HBA features interacted with residues of either specificity pocket. The Xvol coat allowed for the molecule to fill all three pockets of the binding site. The model found all 8 inhibitors of the training set and 14 decoys leading to an Ef = 122.2 (Table 1). Figure S15. A) Model 7: Structure-based pharmacophore model for the S1-S1'& ABS-subset created based on PDB entry 2XEG 8 visualized with the ligand ARM-P4. B) Antibody recruiting GCPII inhibitor ARM-P4 used for model generation. Figure S16. A) automatically generated structure-based pharmacophore model 7 for the GCPII inhibitors of the S1-S1' subset (figure S5) based on PDB ID 2XEG and B) the optimized model 7.
The automatically generated model 7 consisted of 28 features (3 NIs, 19 HBAs, 4 HBDs, 1HC and 1 aromatic) and 30 Xvols. 8 HBAs interacting with water molecules were removed during model optimization. Similarly, 5 more HBAs were removed, 1 interacting with Tyr552, 2 with Arg536, 1 with Ser501 and 1 with Arg210. The 2 HBDs interacting with Gly518 and 1 with Ser513 were removed as well as HBD interacting with water molecule. The aromatic feature as well as the hydrophobic contact were removed as well.
The optimized model 7 consists of 9 features (3 NIs and 6 HBAs) and 29 Xvols (Fig S15). One of the NI features interacts with Arg534 and Arg536, the other with Arg210 and His553 and the last one with Lys699. 1 HBA forms an interaction with Tyr700, 1 with Arg210, 1 with Lys699 and 1 with Asn257, 1 with Arg534 and 1 with Asn519. Figure S17. A) Shared-feature-ligand-based pharmacophore model 8 for the S1-S1'&ABS-subset created based on all eight inhibitors co-crystallized in this binding site shown with folyl-gamma-L-glutamic acid. Figure S18. A) automatically generated ligand-based pharmacophore model 8 for the GCPII inhibitors of the S1-S1' subset ( figure S5)  Overlay of this model with model 7 indicates that NI of this ligand-based model would possibly interact with Lys699 of GCPII. 1 HBA feature shows a potential to interact with Glu424, 1 with Asn257, 1 with the backbone nitrogen of Lys207 while the interaction partner of the last HBA remains unknown. Table S1. Enrichment metrics of the pharmacophore models 1,2 and 4-8. All 54 GCPII active compounds regardless of their binding mode were considered when calculating the parameters. The database of decoys was composed of 2681 compounds (TP -true positive; FP -false positive; TN -true negative; FN -false negative; YoA -yield of actives; EF -enrichment factor; AUC -area under the curve).  Table S2. Overview of the tested virtual hits. Percent inhibition of GCPII activity of the 50 experimentally tested virtual hits. The pharmacophore model that identified the hits is also indicated. Compounds were tested at concentration of 20 M with enzymatic activity assay using highly purified human recombinant GCPII and fluorescently labeled Glu-Glu dipeptide. SD -standard deviation. Compounds with more than 50% inhibition are marked in green, compounds with more than 25% inhibition are marked in yellow.

Selection of virtual SPECS hits for experimental activity testing
The 82 SPECS hits that mapped our models were narrowed down to 50 compounds for which experimental inhibitory activity against GCPII was evaluated. To this end, following steps were used to filter SPECS hits.
1. Compounds resembling natural GCPII substrate were excluded (all compounds containing glutamate with an amide bond on N terminal). 2. SciFinder similarity search was performed for all virtual hits. If the hit was already mentioned/tested against GCPII, it was discarded. 3. Compounds to test were selected with emphasis on their structural diversity that was assessed visually. 4. Since consensus hits (virtual hits in more than one model) are supposed to have a higher probability of being true positives than virtual hits of a single model, we aimed to experimentally test all consensus hits except if they failed criterion 1, 2, or 3. 5. Compounds for experimental activity evaluation were selected aiming to represent every pharmacophore model by the same amount of virtual hits (where possible). If the model mapped too little SPECS compounds, all those satisfying the previous two criteria were tested.
Three of the four consensus hits identified concomitantly by models 2, 5 and 6 were experimentally tested on their GCPII inhibitory activity. Compound AN-329/40945305 (we refer the reader to www.specs.net) was not selected for experimental testing due to its striking similarity to 1 ( Figure S13) and due to SciFinder (scifinder.cas.org) similarity search uncovered two compounds that were already tested against GCPII.
From the 19 compounds that were mapped by two of our pharmacophore models, 16 were selected for experimental activity testing ( Figure S13). The three excluded compounds were all identified by both models 2 and 7. Compound AE-562/40806920 was removed since its exact search in SciFinder returns more than 12K results rendering this compound highly likely to be already tested on many targets. Compound AG-690/12245692 was excluded due to its high similarity to two other double consensus hits, 12 ( Figure S13) and 13 ( Figure S13). Compound AN-584/43503783 was not experimentally tested because of its small molecular weight of 171.152.
All single hits of models that found less than 10 compounds (models 5, 7 and 8, Table S3) were selected for testing. Model 4 identified 15 compounds. They were all experimentally tested except for compound AG-690-33064013 due to its striking similarity to compound 43 and to 50 (AG-205/33006036) Model 2 and model 6 mapped the most SPECS hits of all the models generated, 25 and 26, respectively. Selection of SPECS hits for experimental testing of these models was two-step process. First, hits were ordered according to their descending pharmacophore fit score. Second, the highest-ranking compounds were selected for testing. However, if the compound was visually similar to compounds already selected for testing from other models, or to higher ranked compounds of these models, it was discarded. Summary of the compounds tested from each model is shown Table S3 and Figure S13.  Figure S19. 50 compounds selected for experimental activity testing against GCPII. Compounds within the hitlist of a specific model or within the consensus hitlist of specific models are numbered according to their pharmacophore fit score in descending order. For consensus hitlists, pharmacophore fit score of the most restrictive model (based on EF) is taken into consideration.

Generation of decoys dataset
To generate a set of decoys specific for GCPII, physico-chemical properties of all 54 GCPII active compounds (logP, molecular weight, number of hydrogen bond donating and accepting groups and number of rotatable bonds) were calculated using small molecules module in Discovery Studio 9 . Their statistical analysis is shown in Table S4. As clearly visible from the table, GCPII inhibitors are rather polar compounds mostly possessing many HBA and HBD features. Only two compounds in the dataset of GCPII active compounds have molecular weight higher than 745 and three lower than 280. Only three compounds comprise more than 17 HBAs, only two more than 8 HBDs, only four more than 23 rotatable bonds (Table S4). Therefore, for the construction of decoys, only compounds with properties within the boundaries of our GCPII-adapted Lipinski-like filter (Table SI4) were selected.
The set of decoys was generated by stepwise filtering of more than 2 million bioactive molecules publicly available in the ChEMBL database 10 . Since many compounds in ChEMBL database contain experimental logP values that we wanted to retain, we first filtered out all compounds with missing logP value narrowing the database down to approximately 1.4 million compounds. In the second, our GCPII-adapted Lipinskilike filter reduced the ChEMBL database into 97 400 compounds that, in essence, have similar physicochemical properties to GCPII active compounds. Application of in-house Pipeline Pilot protocol 12 clustered these compounds into 2700 clusters based on their Tanimoto similarity 13 using FCFP4 fingerprint 14,15 . One molecule from each cluster was selected for the decoys generating a decoy set of 2700 compounds.

PAINS analysis of hit compounds
Test compounds were not excluded based on PAINS filtering, we did however subject them to screening with a PAINS filter to discuss any possible problems with assay interference.
The 50 tested hits were screened with a Pan assay interference compounds (PAINS) remover tool (PAINS 1, 2 and 3 filter) implemented in Canvas 16,17 . The remover tool hit on compounds 4, 26 and 27 ( Figure  S13), which were all tested as inactive in our experimental setup. The only active compound found by the filtering tool was compound 17 ( Figure SI13), which was highlighted as a PAINS compound because it contains a quinone structure. Quinones are known for exerting unspecific redox activity that often leads them to appear as active in assays but renders their effects useless in a cellular context. To test for unspecific effects the compounds were also tested on HDAC10 and compound 17 did indeed also exert an effect. We therefore believe the effect on GCPII is unspecific and should not be further pursued.
Compound 22, a weakly active compound that represents a new symmetric scaffold previously unreported for GCPII inhibition was also highlighted as a PAINS compound because of its two 2-amino-3carbonylthiophene groups that might cause protein thiol oxidation 18 . However, GCPII does not contain any thiol group in the proximity of the catalytic site. Moreover, this group only rarely triggers non-specific false positive readout in activity assays 16 , compound 22 bears an additional aromatic substitution at position 4 of the thiophene ring, and specific inhibitors of other proteins bearing this functional group exist, such as inhibitors of tubulin assembly 19 , inhibitors of tyrosin kinase FLT3 20 or Hepatitis C virus inhibitors 21 .

Docking Parameters for compound 46
The GCPII structure 6HKZ 22 was downloaded from PDB 2 and prepared for docking using the protein preparation wizard implemented in the Schroedinger 2020-3 small molecule drug discovery suite (SMDD) 23 using standard settings. All water molecules were deleted from the structure. The protonation states of the amino acids were calculated using Epik at pH 7 with standard settings. The protein-ligand complex was minimized with an OPLS3 force field in three steps. Initially, only hydrogens were allowed to move. In a second minimization step both hydrogens and heavy atoms were allowed to change position up to a maximum of 0.3 Å. Finally, a fully flexible minimization of the complex within the protein preparation wizard was performed. The docking simulation was conducted using the Glide module of Schrodinger SMDD suite 24 in XP (extra precision) mode under default settings (flexible ligand sampling, OPLS3e force field, no constraints). Ten poses were calculated per ligand. To validate the docking workflow a redocking of the co-crystallized ligand of 6HKZ was performed under these settings, resulting in an RMSD of 2.57 Å.
Glide XP 25 was employed as a scoring function, resulting in a score of -7.465 kcal/mol for compound 46 in the pose shown in Fig. 11.

GCPII inhibitors with subnanomolar activities
All known highly potent GCPII inhibitors share common structural motifs: A glutamate moiety binding to the S1' pocket and either a urea or phosphonate groups that act as zinc binding locations.